In Silico Design of New Dual Inhibitors of SARS-CoV-2 MPRO through Ligand- and Structure-Based Methods

The viral main protease is one of the most attractive targets among all key enzymes involved in the life cycle of SARS-CoV-2. Considering its mechanism of action, both the catalytic and dimerization regions could represent crucial sites for modulating its activity. Dual-binding the SARS-CoV-2 main protease inhibitors could arrest the replication process of the virus by simultaneously preventing dimerization and proteolytic activity. To this aim, in the present work, we identified two series’ of small molecules with a significant affinity for SARS-CoV-2 MPRO, by a hybrid virtual screening protocol, combining ligand- and structure-based approaches with multivariate statistical analysis. The Biotarget Predictor Tool was used to filter a large in-house structural database and select a set of benzo[b]thiophene and benzo[b]furan derivatives. ADME properties were investigated, and induced fit docking studies were performed to confirm the DRUDIT prediction. Principal component analysis and docking protocol at the SARS-CoV-2 MPRO dimerization site enable the identification of compounds 1b,c,i,l and 2i,l as promising drug molecules, showing favorable dual binding site affinity on SARS-CoV-2 MPRO.

In an outbreak scenario, caused by the rapid global spread of SARS-CoV-2, the worldwide research community points to the urgency of developing an effective therapeutic strategy to support vaccination campaigns [7]. In this light, the "drug repurposing" approach represented a successful strategy, allowing to bypass the lengthy process of pharmacokinetic and toxicological clinical trials required for the approval of any new drugs [8][9][10][11].
Several viral proteins were investigated as possible SARS-CoV-2 druggable targets, with the Main Protease (M PRO ) emerging as one of the most attractive proteins [12,13].
The SARS-CoV-2 M PRO , also known as 3C-like protease (3CL PRO ), is a cysteine catalytic enzyme with a pivotal role in mediating viral replication and transcription. In particular, the viral polyproteins, which are released from translated RNA, are processed by the SARS-CoV-2 M PRO at 11 conserved sites generating 12 fragments (non-structural proteins, Figure 1. X-ray structure of dimeric SARS-CoV-2 M PRO (pdb code 6Y2F) [16]; the two monomers are highlighted in blue and grey.
According to the nomenclature introduced by Schechter and Berger, the amino acids of the viral polyproteins processed by proteases are indicated as -P3-P2-P1↓P1′-, where P2 tolerates more hydrophobic amino acid with a clear preference for leucine, P1′ is a nonconserved prime recognition site (a small amino acid, such as serine, glycine, or alanine), P1 is a glutamine, and the arrow ↓ represents the cleavage site between P1 and P1′ ( Figure  2) [13,17]. The M pro exclusively cleaves polypeptide sequences after a glutamine residue [18]. Since no human host-cell proteases are known with this substrate specificity, the main protease is a selective biological target for COVID-19 antiviral treatment.   [16]; the two monomers are highlighted in blue and grey.
According to the nomenclature introduced by Schechter and Berger, the amino acids of the viral polyproteins processed by proteases are indicated as -P3-P2-P1↓P1 -, where P2 tolerates more hydrophobic amino acid with a clear preference for leucine, P1 is a non-conserved prime recognition site (a small amino acid, such as serine, glycine, or alanine), P1 is a glutamine, and the arrow ↓ represents the cleavage site between P1 and P1 ( Figure 2) [13,17]. The M pro exclusively cleaves polypeptide sequences after a glutamine residue [18]. Since no human host-cell proteases are known with this substrate specificity, the main protease is a selective biological target for COVID-19 antiviral treatment. The SARS-CoV-2 M PRO , also known as 3C-like protease (3CL PRO ), is a cysteine catalytic enzyme with a pivotal role in mediating viral replication and transcription. In particular, the viral polyproteins, which are released from translated RNA, are processed by the SARS-CoV-2 M PRO at 11 conserved sites generating 12 fragments (non-structural proteins, NPSs) [14]. The M PRO is characterized by the dimerization of two monomers (A and B). Each monomer consists of three domains connected by long loop regions. Domain I (8-101) comprises six β-strands and one α-helix, while domain II (102-184) and domain III (201-303) include six β-strands and five α-helices, respectively ( Figure 1). The binding pocket is divided into four subsites, S1′, S1, S2, and S3/S4: S1′ represents the catalytic site and is located between domains I and II, involving the Cys 145 /His 41 catalytic dyad; S1 is characterized by the side chains of Phe 140 , Asn 142 , His 163 , His 164 , Glu 166 , and His 172 ; S2 consists of hydrophobic amino acids, such as Met 49 , Tyr 54 , Met 165 , Pro 168 , Val 186 ; finally S3/S4, including residues Gln 189 , Ala 191 , Gln 192 , Gly 251 , is particularly exposed to the solvent [15]. According to the nomenclature introduced by Schechter and Berger, the amino acids of the viral polyproteins processed by proteases are indicated as -P3-P2-P1↓P1′-, where P2 tolerates more hydrophobic amino acid with a clear preference for leucine, P1′ is a nonconserved prime recognition site (a small amino acid, such as serine, glycine, or alanine), P1 is a glutamine, and the arrow ↓ represents the cleavage site between P1 and P1′ ( Figure  2) [13,17]. The M pro exclusively cleaves polypeptide sequences after a glutamine residue [18]. Since no human host-cell proteases are known with this substrate specificity, the main protease is a selective biological target for COVID-19 antiviral treatment.  A detailed structural analysis of the M PRO protein revealed the presence of a potential allosteric site between the domains II and III, which is involved in the protease dimerization, Figure 1 [19][20][21]. In particular, the N-finger residues Ser 1 -Gly 11 , residue Asn 214 , the region

Results and Discussion
Virtual screening is a reliable approach in the search for candidates able to target SARS-CoV-2. Among all the structural/non-structural viral proteins of SARS-CoV-2, M PRO proved to be the most attractive target for the development of selective antiviral drugs. In the present work, we propose in silico studies to analyze the SARS-CoV-2 M PRO inhibition activity of small molecules with the aim of identifying new promising dual binding site modulators of SARS-CoV-2 M PRO . Figure 3 shows the flowchart of the in silico mixed ligand-structure screening for the identification of new SARS-CoV-2 M PRO inhibitors. The computational protocol was based on a hierarchical workflow with an initial phase of filtering using a ligand-based approach focused on DRUDIT ONLINE (DRUg DIscovery Tools, open access web-service, www.drudit.com, accessed on 20 March 2023) [42], a free online resource based on the calculation of molecular descriptors by MOLDESTO (MOLecular DEScriptors TOol) [42]. The use of DRUDIT Biotarget Predictor Tool (BPT) allowed the selection of two clusters of small molecules already known in the literature for their antitumor activity [43,44] and characterized by a benzo[b]thiophene or a benzo[b]furan scaffold. Absorption, Distribution, tool, a robust predictive model to support biocompatible drug discovery projects [45]. SwissADME web tool, a robust predictive model to support biocompatible drug discovery projects [45].
Then, to improve the in silico affinity results, the 24 selected compounds were investigated by structure-based molecular docking studies at the catalytic binding site.
The matrix of molecular descriptors for the 24 molecules, obtained through the DRUDIT web-service, was merged with the sequence of molecular descriptors of the noncovalent allosteric inhibitor pelitinib, to perform a multivariate analysis. Finally, the resulting 13 benzothiophene and benzofuran compounds were analyzed using structurebased molecular docking protocols at the dimerization binding site.

In Silico Ligand-Based Approach: DRUDIT ONLINE
To perform the in silico ligand-based approach, we used the DRUDIT Biotarget Predictor Tool (BPT), which allows the prediction of the affinity of input structures to a selected biological target [42].
The first step required the construction of a template for the SARS-CoV-2 M PRO binding site, following the method recently described in the literature [46]. Many well-known drugs were used to perform molecular docking studies and evaluate their ability to bind to the catalytic site of the SARS-CoV-2 M PRO . The resulting best scored molecules were used to build the SARS-CoV-2 M PRO template, which was implemented in DRUDIT. Subsequently, an in-house structure database of approximately 10,000 heterocyclic structures was uploaded to the web-service DRUDIT PBT. Standard parameters (N = 500, Z = 50, G = a) [46] were used, and the output data were ranked according to the Drudit Affinity Score (DAS), a parameter whose value (in the range 0/1, low/high affinity), reflects the capability of compounds to bind into the SARS-CoV-2 M PRO catalytic site. Then, to improve the in silico affinity results, the 24 selected compounds were investigated by structure-based molecular docking studies at the catalytic binding site.
The matrix of molecular descriptors for the 24 molecules, obtained through the DRU-DIT web-service, was merged with the sequence of molecular descriptors of the noncovalent allosteric inhibitor pelitinib, to perform a multivariate analysis. Finally, the resulting 13 benzothiophene and benzofuran compounds were analyzed using structure-based molecular docking protocols at the dimerization binding site.

In Silico Ligand-Based Approach: DRUDIT ONLINE
To perform the in silico ligand-based approach, we used the DRUDIT Biotarget Predictor Tool (BPT), which allows the prediction of the affinity of input structures to a selected biological target [42].
The first step required the construction of a template for the SARS-CoV-2 M PRO binding site, following the method recently described in the literature [46]. Many well-known drugs were used to perform molecular docking studies and evaluate their ability to bind to the catalytic site of the SARS-CoV-2 M PRO . The resulting best scored molecules were used to build the SARS-CoV-2 M PRO template, which was implemented in DRUDIT. Subsequently, an in-house structure database of approximately 10,000 heterocyclic structures was uploaded to the web-service DRUDIT PBT. Standard parameters (N = 500, Z = 50, G = a) [46] were used, and the output data were ranked according to the Drudit Affinity Score (DAS), a parameter whose value (in the range 0/1, low/high affinity), reflects the capability of compounds to bind into the SARS-CoV-2 M PRO catalytic site.

ADME Properties
The 24 selected molecules were submitted to the SwissADME web-tools (http://www. swissadme.ch, accessed on 20 March 2023) [45] considering a set of well consolidated parameters for searching bioactive compounds, such as PAINS filters [47], Lipinski's rules [48], Veber [49], and Egan filters [50]. The analysis of the data showed in Table 2 highlighted that the benzo[b]thiophene and benzo[b]furan compounds generally met the expectations in terms of bioactivity. Thirteen of the twenty-four structures have no violations, and all compounds have no PAINS. In light of these considerations, no compounds were excluded for the in silico structure-based analysis. Table 2. Drug-likeness parameters calculated for the selected 24 compounds.

Compound
Lipinski Violations

PAINS Alerts
Total All the parameters calculated through SwissADME are present in the Table S1, Supplementary Materials.

In Silico Structure-Based Studies: Molecular Docking at the Catalytic Site of SARS-CoV-2 M PRO
Induced Fit Docking (IFD) studies were performed to validate the obtained ligandbased data and to gain insight into the structural features of ligand/SARS-CoV-2 M PRO (pdb code 7VH8 [51]) complexes, analyzing the mutual conformational changes between ligands and proteins.
We focused the docking grid on the SARS-CoV-2 M PRO binding pocket, including the four subsites S1 , S1, S2, S3/S4 as described in the Section 3. Figure 5b shows the 3D active binding site of SARS-CoV-2 M PRO in covalently bonding with nirmatrelvir (PF-07321332, 2D structure in Figure 5a), a second-generation orally available protease inhibitor currently in phase three clinical trials in combination with ritonavir (PAXLOVID ® , see ClinicalTrials.gov identifier: NCT04960202).
The IFD studies aim to confirm the DRUDIT prediction and the capability of ethyl 3-benzoylamino-5-[(1H-imidazol-4-yl-methyl)-amino]-benzo[b]thiophene-2-carboxylates of type 1 and ethyl 3-benzoylamino-5-[(1H-imidazol-4-yl-methyl)-amino]-benzo[b]furan-2carboxylates of type 2 to effectively interact with the selected target binding site. Table 3 shows the IFD and docking scores of the selected 24 structures 1a-l and 2a-l, and the reference ligand nirmatrelvir (PF-07321332) [51]. The IFD studies aim to confirm the DRUDIT prediction and the capability of ethyl 3benzoylamino furan-2carboxylates of type 2 to effectively interact with the selected target binding site. Table 3 shows the IFD and docking scores of the selected 24 structures 1a-l and 2a-l, and the reference ligand nirmatrelvir (PF-07321332) [51]. Table 3. IFD and docking output results for 1a-l and 2a-l and the co-crystallized ligand nirmatrelvir (pdb code 7VH8) [51].   The IFD analysis confirmed the biotarget affinity results and identified the benzo[b]thiophenes 1a-d,f,i-l and benzo[b]furans 2h,i,l as the most promising competitive inhibitors (IFD score range from −675.768 to −673.730) of the SARS-CoV-2 M PRO catalytic binding site, with higher IFD scores than nirmatrelvir (IFD score −673.142) (Table 3). Table 4 provides the overview of the amino acids involved in the binding with the 12 highest scoring compounds. The labelled residues were highlighted by the analysis of 2D and 3D ligand pose maps at a distance of 3 Å.  , compounds 1a-d,f,i,k,l  and 2h,i,l establish more interactions than nirmatrelvir with the conserved amino acids of the four SARS-CoV-2 M PRO sub-regions S1, S1 , S2, and S3/S4, suggesting an improved affinity of the benzo[b]thiophene and benzo[b]furan compounds for the catalytic binding site and resulting in more stable ligand/protein complexes.

SARS-CoV
As shown in Figure 6b,d, the binding pocket exhibits suitable properties for ligands 1d and 2l, which are the two highest IFD scoring compounds (2D structures shown in Figure 6a,c). Both compounds interact with the SARS-CoV-2 M PRO binding site by extending all substituents into the four subsites S1 , S1, S2, S3/S4, and creating a network of key reversible hydrogen bonds with the amino acids: His 41    Among the ligands with higher IFD scores, the analysis of the binding poses of derivatives 1b, 1c, 1l and 2l, shown in Figure 7, highlights a remarkable overlap of poses, indicating a redundancy in the position of the key elements of the small molecules within the four sub-pockets, and forming a large number of interactions. The imidazole moiety, charged at physiological pH, is capable to penetrate deeply into the S1 and S3/S4 subregions (Asn 142 , Gly 143 , His 163 , His 164 , Glu 166 ) and stabilize itself by forming hydrogen interactions with the side chains and/or the backbone of residues, that are particularly exposed to the solvent in the S3/S4 site. Probably, this portion could mimic the Gln residue of the natural substrates, similarly to the five membered γ-lactamic ring, which is the most recurrent fragment of selective inhibitors at the catalytic binding site, reported in the literature [13].   The carboxyethyl moiety is stabilized in the S1 pocket, instead of the carboxyamide moiety, simulating a peptide bond of natural substrate which creates several interactions with the side chains of His 41 , Asn 142 and Glu 166 , in both S1 and S1 pockets. The substituted phenyl rings are arranged in the region adjacent to the S1 cleft.

Statistical Analysis: Principal Component Analysis (PCA)
The structural features of the selected compounds of types 1 and 2 prompted us to evaluate them also as potential binders for the allosteric dimerization site of the SARS-CoV-2 M PRO , as non-competitive inhibitors. Similarly to the known dimerization site inhibitor pelitinib, the benzo[b]thiophene and benzo[b]furan compounds have a series of aromatic rings, linked by rotatable bonds.
To investigate the potential inhibitory activity of the selected compounds at the dimerization site of SARS-CoV-2 M PRO , we performed a Principal Component Analysis (PCA), including the molecular descriptor matrix of the 24 compounds, obtained from DRU-DIT ligand-based studies, and the molecular descriptors of the non-covalent allosteric inhibitor pelitinib.
The application of PCA to the matrix of structures versus molecular descriptors (Supplementary Materials, Matrix S1), showed a total variance of 50% expressed by the first two components. The bidimensional plot (PC1 versus PC2, Figure 8) shows the 2D arrangement of the molecules in the graph compared with the reference ligand pelitinib.
hibitor pelitinib, the benzo[b]thiophene and benzo[b]furan compounds have a series of aromatic rings, linked by rotatable bonds.
To investigate the potential inhibitory activity of the selected compounds at the dimerization site of SARS-CoV-2 M PRO , we performed a Principal Component Analysis (PCA), including the molecular descriptor matrix of the 24 compounds, obtained from DRUDIT ligand-based studies, and the molecular descriptors of the non-covalent allosteric inhibitor pelitinib.
The application of PCA to the matrix of structures versus molecular descriptors (Supplementary Materials, Matrix S1), showed a total variance of 50% expressed by the first two components. The bidimensional plot (PC1 versus PC2, Figure 8) shows the 2D arrangement of the molecules in the graph compared with the reference ligand pelitinib. Based on the distribution of the compounds in this graph, the molecules in the proximity of pelitinib were identified as new inhibitors potentially capable of modulating the dimerization process. The distances of the molecules from the reference pelitinib were calculated and the data were listed in Table 5, which provides cartesian coordinates for the derivatives in the circle (Figure 8), (see Table S2, Supplementary Materials, for all coordinates). Benzo[b]thiophenes 1b,c,g-i,l and benzo[b]furans 2a-c,g-i,l were selected to be further investigated with structure-based studies, considering the dimerization region of SARS-CoV-2 M PRO as allosteric binding site. Based on the distribution of the compounds in this graph, the molecules in the proximity of pelitinib were identified as new inhibitors potentially capable of modulating the dimerization process. The distances of the molecules from the reference pelitinib were calculated and the data were listed in Table 5, which provides cartesian coordinates for the derivatives in the circle (Figure 8), (see Table S2, Supplementary Materials, for all coordinates). Benzo[b]thiophenes 1b,c,g-i,l and benzo[b]furans 2a-c,g-i,l were selected to be further investigated with structure-based studies, considering the dimerization region of SARS-CoV-2 M PRO as allosteric binding site.

In Silico Structure-Based Studies: Induced Fit Docking (IFD) into the Allosteric Site of SARS-CoV-2 M PRO
Following the statistical analyses, IFD simulations performed by fixing the docking grid on the SARS-CoV-2 M PRO allosteric site on pdb code 7AXM [41] were performed. In Figure 9b, the crystallographic structure of SARS-CoV-2 M PRO complexed with noncovalent allosteric inhibitor pelitinib is shown (2D structure in Figure 9a).

In Silico Structure-Based Studies: Induced Fit Docking (IFD) into the Allosteric Site of SARS-CoV-2 M PRO
Following the statistical analyses, IFD simulations performed by fixing the docking grid on the SARS-CoV-2 M PRO allosteric site on pdb code 7AXM [41] were performed. In Figure 9b, the crystallographic structure of SARS-CoV-2 M PRO complexed with non-covalent allosteric inhibitor pelitinib is shown (2D structure in Figure 9a). The analysis of the results shows compounds with higher affinity than pelitinib. In Table 6  Considering the IFD results for both catalytic and dimerization sites, compounds 1b, c,i,l and 2i,l were found to have interesting IFD scores, suggesting a dual binding site inhibitory activity of SARS-CoV-2 M PRO .
Analysis of the amino acid maps (Table 7)  The analysis of the results shows compounds with higher affinity than pelitinib. In Table 6   Considering the IFD results for both catalytic and dimerization sites, compounds 1b, c,i,l and 2i,l were found to have interesting IFD scores, suggesting a dual binding site inhibitory activity of SARS-CoV-2 M PRO .
Analysis of the amino acid maps (Table 7) The selected compounds establish hydrogen bonds with both the residues of the N-finger region and the region around residues Arg 298 , Cys 300 , Ser 301 . The allosteric ligand/protein complex in the pocket between the domains II and III could interfere with the necessary interactions between the two monomers and prevent the dimerization process with the consequent inactivation of the SARS-CoV-2 M PRO .
In Figure 10, compounds 1c and 1l are shown in a recurring position, fitting in the SARS-CoV-2 M PRO allosteric site (dimerization domain), in which there are highly conserved with essential amino acids, such as N-finger residues (Ser 1 , Gly 2 , Phe 3 , Arg 4

Molecular Dynamic Simulation
Molecular dynamic simulations were performed to investigate the stability and dynamics of the SARS-CoV-2 M PRO in complex with 1d (catalytic site, 7VH8) and 1c (allosteric site, 7AXM). The response was studied in terms of protein and ligand binding energy, demonstrating high stability across the simulation time (15 and 20 ns for 1d and 1c, respectively) and reaching a plateau energy (time-energy graphs for both complexes are available in Figures S1 and S2, Supplementary Materials). This further analysis allowed us to confirm the robustness of the in silico protocol.

Ligand-Based Studies
The web-service DRUDIT 1.0 (www.drudit.com, accessed on 20 March 2023) operates through four servers, each of which can perform more than ten jobs simultaneously, and several software modules implemented in C and JAVA running on MacOS Mojave. The Biotarget Finder Module was used to screen small molecule drug candidates as SARS-CoV-2 M PRO inhibitors [42].

Biotarget Predictor Tool (BPT)
The tool provides prediction of the binding affinity between candidate molecules and the specified biological target. The template of SARS-CoV-2 M PRO was created using a set of well-known drugs to perform molecular docking studies at catalytic site of SARS-CoV-2 M PRO . It was uploaded in DRUDIT, and the default DRUDIT parameters (N = 500, Z = 50, G = a) were used [42,46]. In accordance with the first phase of the in silico workflow, the database was uploaded in DRUDIT and submitted to the Biotarget Predictor. The output results were obtained as DAS (Drudit Affinity Scores) for each structure, reflecting the binding affinity of compounds against the SARS-CoV-2 M PRO catalytic site. Figure 10. Three-dimensional overlaps of compounds 1c, and 1l at the SARS-CoV-2 M PRO allosteric site (pdb code 7AXM) [41]; nitrogen, oxygen, fluorine, chlorine, and sulfur atoms are in blue, red, light green, dark green, and yellow, respectively.

Molecular Dynamic Simulation
Molecular dynamic simulations were performed to investigate the stability and dynamics of the SARS-CoV-2 M PRO in complex with 1d (catalytic site, 7VH8) and 1c (allosteric site, 7AXM). The response was studied in terms of protein and ligand binding energy, demonstrating high stability across the simulation time (15 and 20 ns for 1d and 1c, respectively) and reaching a plateau energy (time-energy graphs for both complexes are available in Figures S1 and S2, Supplementary Materials). This further analysis allowed us to confirm the robustness of the in silico protocol.

Ligand-Based Studies
The web-service DRUDIT 1.0 (www.drudit.com, accessed on 20 March 2023) operates through four servers, each of which can perform more than ten jobs simultaneously, and several software modules implemented in C and JAVA running on MacOS Mojave. The Biotarget Finder Module was used to screen small molecule drug candidates as SARS-CoV-2 M PRO inhibitors [42].

Biotarget Predictor Tool (BPT)
The tool provides prediction of the binding affinity between candidate molecules and the specified biological target. The template of SARS-CoV-2 M PRO was created using a set of well-known drugs to perform molecular docking studies at catalytic site of SARS-CoV-2 M PRO . It was uploaded in DRUDIT, and the default DRUDIT parameters (N = 500, Z = 50, G = a) were used [42,46]. In accordance with the first phase of the in silico workflow, the database was uploaded in DRUDIT and submitted to the Biotarget Predictor. The output results were obtained as DAS (Drudit Affinity Scores) for each structure, reflecting the binding affinity of compounds against the SARS-CoV-2 M PRO catalytic site.

Structure-Based Studies
The preparation process of ligands and protein-ligand complexes used for in silico studies was performed as detailed below:

Ligand Preparation
The ligands for docking were prepared through the LigPrep tool, available in the Maestro Suite 2022, Schrödinger software [52]. For each ligand, all possible tautomers and stereoisomers were generated for a pH of 7.0 ± 0.4, using default setting, through the Epik ionization method [53]. Consequently, the integrated Optimized Potentials for Liquid Simulations (OPLS) 2005 force field was used to minimize the energy status of the ligands [54].

Protein Preparations
The two crystal structures of SARS-CoV-2 M PRO (pdb codes 7VH8 [51], 7AXM [41]) were downloaded from the Protein Data Bank [55,56]. As regard the pdb code 7VH8, a first breakup of the covalent bound between the cocrystal ligand and the Cys 145 was carried out.
Successively, both the two protein structures were prepared using the Protein Preparation Wizard, in the Schrödinger software, with the default setting [57]. In detail, bond orders were assigned, including Het group, hydrogen atoms were added, all water molecules were delated, and protonation of the heteroatom states were carried out using the Epik-tool (with the pH set at biologically relevant values, i.e., at 7.0 ± 0.4). The H-bond network was then optimized. The structure was finally subjected to a restrained energy minimization step (RMSD of the atom displacement for terminating the minimization was 0.3 Å), using the OPLS 2005 force field [54].

Docking Validation
Molecular docking studies were executed and scored by using the Glide module, available in the Schrödinger Suite program package. The receptor grids were obtained through assignment the original ligands (PF-07321332 and pelitinib for pdb codes 7VH8 [51] and 7AXM [41], respectively) as the centroid of the grid boxes. Extra Precision (XP) mode, as scoring function, was used to dock the generated 3D conformers into the receptor model. The post-docking minimization step was performed with a total of five poses for each ligand conformer, and a maximum of two docking poses were generated per ligand conformer. The proposed docking procedure was able to re-dock the original ligands within the receptor-binding pockets with RMSD 0.51 Å. Table 8 shows all the parameters combinations used for RMDS value optimization for both the proteins. In detail, radii Van der Waals scaling allows us to temporarily remove active-site residue side chains. By default, the scaling factor is 0.50 for the receptor and 0.50 for the ligand, with a partial charge threshold of 0.15. Removing the side chains from active site residues provides more room for ligand docking, so the receptor does not need to be quite as soft. The side chains are restored after docking.
The side chain optimization makes possible to reduce the distance from the ligand that defines residues for refinement. In general, the optimal value for this parameter is set to 5.0 Å by default, ensuring that the optimal setting for side chains is selected.
The energy minimization parameters controls the minimization protocol through the distance-dependent dielectric constant (the optimum is to set the protein/ligand dielectric constants to values of 1-2, default setting at 2) and the maximum number of minimization steps (by default fixed to 100) [54]. Induced fit docking simulation was performed using the IFD application, an accurate and robust Schrödinger technology that accounts for both ligand and receptor flexibility [58,59]. Schrödinger's induced fit docking validated protocol was applied by using the SARS-CoV-2 M PRO protein from the PDB (pdb codes 7VH8 [51] and 7AXM [41]), previously refined by the Protein Preparation module. The IFD score (IFD score = 1.0 Glide Gscore + 0.05 Prime Energy), which includes protein-ligand interaction energy and system total energy, was calculated and used to rank the IFD poses. The more positive in modulus the IFD score, the more favorable the binding.

Molecular Dynamic Simulation
Molecular dynamics simulations were performed using the MacroModel task available in Maestro Schrodinger for the top two best low binding energy ligand-protein complexes after docking: SARS-CoV-2 M PRO (catalytic site, 7VH8) with 1d and SARS-CoV-2 M PRO (allosteric site, 7AXM) with 1c. The OPLS-2005 force field was used to model the proteins and ligands and the systems were energy minimized for 1000 steps before a production run of 15 and 20 ns, respectively. The systems temperature was maintained around 300 K. The results were analyzed in terms of protein and ligand time-lapse binding energy.

Principal Component Analysis
PCA, one of the most-widely used multivariate exploratory techniques, enables a drastic dimensionality reduction of an original raw data, transforming the original matrix to a new one, whose set of variables, termed as Principal Components (PCs), appear to be ordered with descending importance in terms of variance. Principal Components Analysis can be highly useful for data classification and pattern recognition. In this work, DRUDIT was used to obtain the original matrix of objects versus variables (Supplementary Materials, Matrix S1), and free version TIBCO Statistica ® software was used to perform principal component analysis.
Grubb's test, also known as ESD method (extreme studentized deviate), was performed. In detail, the outliers were determined by singularly evaluating those compounds outside the red circle ( Figure 8) in comparison with the cluster of molecules closest to pelitinib. The identified outliers were not included in the next step of the virtual screening.

Conclusions
The SARS-CoV-2 M PRO is a key cysteinyl enzyme for virus replication. Considering its mechanism of action, which consists of a preliminary dimerization with consequent activation of its catalytic site, small molecules capable of binding and interfering with both the dimerization and the proteolytic process could represent an interesting and more efficacious class of therapeutics, with a dual binding activity. With this aim, in silico approaches are reliable strategies in the search for candidates in drug discovery projects.
Here we identified a set of ethyl 3-benzoylamino-5-[(1H-imidazol-4-yl-methyl)-amino]benzo[b]thiophene-2-carboxylates 1 and ethyl 3-benzoylamino-5-[(1H-imidazol-4-yl-methyl)amino]-benzo[b]furan-2-carboxylates 2 as potential SARS-CoV-2 M PRO inhibitors, through a hierarchical and hybrid virtual screening. In detail, the Biotarget Predictor ligand-based Tool, available in the open-access web-platform DRUDIT ONLINE (www.drudit.com, accessed on 20 March 2023) allows for the filtering of a large in-house structure database, identifying the aforementioned set of small molecules with high affinity against the SARS-CoV-2 M PRO catalytic binding site. ADME properties of the selected compounds were investigated through the SwissADME tool (www.swissadme.ch, accessed on 20 March 2023) and induced fit docking studies were performed on the catalytic site to confirm DRUDIT prediction. Moreover, aiming at evaluating the possibility of a dual binding mechanism of action, the identified hits were further investigated by means of principal component analysis and IFD into the dimerization site. By combining these results, compounds 1b,c,i,l and 2i,l, with favorable ADME properties (drug-likeliness, lead-likeliness, no PAINS), showed the capability to strongly bind both to the catalytic and allosteric SARS-CoV-2 M PRO sites suggesting a potential dual activity. In general, considering the high amino acids conservation rate in both pockets, the potential drugs proposed here might even be effective against mutation variants and other coronaviruses.
In silico simulations of a complex system are becoming increasingly popular in the drug development process and across clinical research. The use of computational models and simulations offers significant advantages over human-based clinical trials in both operational factors and therapeutic outcomes, providing the tools to evaluate various treatments qualitatively and quantitatively on specific diseases, and offering more practical and economical experiments.
Even if multiple in silico virtual screenings on the SARS-CoV-2 M PRO have been reported in literature in the past three years, our computational workflow is peculiar. In particular, we took the advantages of our in-house ligand based Biotarget Predictor Tool (BPT) which allowed us to screen an enormous ligands library in negligible computational time and with no need for particularly high-performance hardware. This tool, integrated with both structure-based techniques (molecular docking and dynamics) and, interestingly, multivariate statistical analysis, was applied to evaluate a potential dual binding activity on SARS-CoV-2 M PRO , which has been rarely explored in other virtual screening research.